Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective
Adaku Uchendu, Thai Le, Dongwon Lee
Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, especially a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors, while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concerns applied only to human authors. However, in recent years, due to the explosive advancements in Neural Text Generation (NTG) techniques in NLP, capable of synthesizing human-quality open-ended texts (so-called "neural texts"), one now has to consider authorship by humans, machines, or their combination. Due to the implications and potential threats of neural texts when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and to develop novel AA/AO solutions for dealing with neural texts. In this survey, therefore, we make a comprehensive review of recent literature on the attribution and obfuscation of neural text authorship from a Data

Figure 1: The quadrant of research problems, where (1) the GRAY quadrants are the focus of this survey, and (2) the BLACK box indicates the specialized binary AA problem of distinguishing neural texts from human texts.

released (e.g., FAIR [16, 82], CTRL [59], PPLM [25], T5 [94], Wu-Dao